Abstract

We apply the stochastic approximation method to construct a large class of recursive kernel
estimators of a probability density, including the one introduced by Hall and Patil (1994).
We study the properties of these estimators and compare them with Rosenblatt's nonrecursive
estimator. It turns out that, for pointwise estimation, it is preferable to use Rosenblatt's
nonrecursive kernel estimator rather than any recursive estimator. Conversely, for estimation by
confidence intervals, it is better to use a recursive estimator rather than Rosenblatt's estimator.
1 Introduction

The advantage of recursive estimators over their nonrecursive version is that their update, from a
sample of size n to one of size n + 1, requires considerably fewer computations. This property is
particularly important in the framework of density estimation, since the number of points at which
the function is estimated is usually very large. The first recursive version of Rosenblatt's kernel
density estimator, and the most famous one, was introduced by Wolverton and Wagner (1969),
and was widely studied; see among many others Yamato (1971), Davies (1973), Devroye (1979),
Wegman and Davies (1979) and Roussas (1992). Competing recursive estimators, which may be
regarded as weighted versions of Wolverton and Wagner's estimator, were introduced and studied
by Deheuvels (1973), Wegman and Davies (1979) and Duflo (1997). Recently, Hall and Patil (1994)
defined a large class of weighted recursive estimators, including all the previous recursive
estimators. In this paper, we apply the stochastic approximation method to define a class of
recursive kernel density estimators, which includes the one introduced by Hall and Patil (1994).

The most famous use of stochastic approximation algorithms in the framework of nonparametric 
statistics is the work of Kiefer and Wolfowitz (1952), who build up an algorithm which allows the 
approximation of the maximizer of a regression function. Their well-known algorithm was widely 
discussed and extended in many directions (see, among many others, Blum (1954), Fabian (1967), 
Kushner and Clark (1978), Hall and Heyde (1980), Ruppert (1982), Chen (1988), Spall (1988), 
Polyak and Tsybakov (1990), Dippon and Renz (1997), Spall (1997), Chen, Duncan and Pasik- 
Duncan (1999), Dippon (2003), and Mokkadem and Pelletier (2007a)). Stochastic approximation 
algorithms were also introduced by Revesz (1973, 1977) to estimate a regression function, and by 
Tsybakov (1990) to approximate the mode of a probability density. 

Let us recall Robbins-Monro's scheme to construct approximation algorithms of search of the
zero $z^*$ of an unknown function $h : \mathbb{R} \to \mathbb{R}$. First, $Z_0 \in \mathbb{R}$ is
arbitrarily chosen, and then the sequence $(Z_n)$ is recursively defined by setting

$$Z_n = Z_{n-1} + \gamma_n W_n$$

where $W_n$ is an "observation" of the function $h$ at the point $Z_{n-1}$, and where the stepsize
$(\gamma_n)$ is a sequence of positive real numbers that goes to zero.
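
The Robbins-Monro recursion above can be illustrated with a minimal sketch. The target function $h(z) = 2 - z$, the Gaussian observation noise, the stepsize $\gamma_n = 1/n$ and all function names are assumptions made only for this illustration; they do not come from the paper.

```python
import random


def robbins_monro(observe, z0, n_steps, stepsize=lambda n: 1.0 / n):
    """Iterate Z_n = Z_{n-1} + gamma_n * W_n, with W_n = observe(Z_{n-1})."""
    z = z0
    for n in range(1, n_steps + 1):
        z = z + stepsize(n) * observe(z)
    return z


# Hypothetical example: h(z) = 2 - z observed with N(0, 1) noise; its zero is z* = 2.
rng = random.Random(0)


def noisy_h(z):
    return (2.0 - z) + rng.gauss(0.0, 1.0)


z_hat = robbins_monro(noisy_h, z0=0.0, n_steps=20000)
print(z_hat)  # should lie close to the zero z* = 2
```

With this particular stepsize the iterate reduces to the running mean of the noisy observations of the zero, which is why the dependence on $Z_0$ disappears after the first step.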

Let $X_1, \ldots, X_n$ be independent, identically distributed $\mathbb{R}^d$-valued random vectors,
and let $f$ denote the probability density of $X_1$. To construct a stochastic algorithm, which
approximates the function $f$ at a given point $x$, we define an algorithm of search of the zero of
the function $h : y \mapsto f(x) - y$. We thus proceed in the following way: (i) we set
$f_0(x) \in \mathbb{R}$; (ii) for all $n \geq 1$, we set

$$f_n(x) = f_{n-1}(x) + \gamma_n W_n(x)$$

where $W_n(x)$ is an "observation" of the function $h$ at the point $f_{n-1}(x)$. To define $W_n(x)$,
we follow the approach of Revesz (1973, 1977) and of Tsybakov (1990), and introduce a kernel $K$
(that is, a function satisfying $\int_{\mathbb{R}^d} K(x)\,dx = 1$) and a bandwidth $(h_n)$ (that is, a
sequence of positive real numbers that goes to zero), and set
$W_n(x) = h_n^{-d} K(h_n^{-1}[x - X_n]) - f_{n-1}(x)$. The stochastic approximation algorithm we
introduce to recursively estimate the density $f$ at the point $x$ can thus be written as

$$f_n(x) = (1 - \gamma_n) f_{n-1}(x) + \gamma_n h_n^{-d} K\left(\frac{x - X_n}{h_n}\right). \qquad (1)$$

Let $(w_n)$ be a positive sequence such that $\sum w_n = \infty$. When the stepsize $(\gamma_n)$ is
chosen equal to $(w_n[\sum_{k=1}^n w_k]^{-1})$, the estimator $f_n$ defined by (1) can be rewritten as

$$f_n(x) = \frac{1}{\sum_{k=1}^n w_k} \sum_{k=1}^n w_k h_k^{-d} K\left(\frac{x - X_k}{h_k}\right). \qquad (2)$$

The class of estimators defined by the stochastic approximation algorithm (1) thus includes the
general class of recursive estimators expressed as (2), and introduced in Hall and Patil (1994). In
particular, the choice $(w_n) = 1$ produces the estimator proposed by Wolverton and Wagner (1969),
the choice $(w_n) = (h_n^{d/2})$ yields the estimator considered by Wegman and Davies (1979), and the
choice $(w_n) = (h_n^d)$ gives the estimator considered by Deheuvels (1973) and Duflo (1997).
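
As an illustration, the sketch below runs the recursion $f_n(x) = (1-\gamma_n) f_{n-1}(x) + \gamma_n h_n^{-d} K((x - X_n)/h_n)$ in dimension $d = 1$ with the stepsize $\gamma_n = w_n / \sum_{k \le n} w_k$, and compares it with the weighted average of rescaled kernels. The Gaussian kernel, the bandwidth $h_k = k^{-1/5}$, the sample and the function names are assumptions chosen only for the example.

```python
import math
import random


def gaussian_kernel(z):
    """Standard normal density, used here as an example kernel K (d = 1)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def recursive_estimate(x, sample, weights, bandwidths, f0=0.0):
    """Run the recursion with gamma_n = w_n / (w_1 + ... + w_n)."""
    f, cum_w = f0, 0.0
    for x_n, w_n, h_n in zip(sample, weights, bandwidths):
        cum_w += w_n
        gamma_n = w_n / cum_w
        f = (1.0 - gamma_n) * f + gamma_n * gaussian_kernel((x - x_n) / h_n) / h_n
    return f


def weighted_estimate(x, sample, weights, bandwidths):
    """Weighted average of the rescaled kernels (non-recursive form)."""
    num = sum(w * gaussian_kernel((x - x_k) / h) / h
              for x_k, w, h in zip(sample, weights, bandwidths))
    return num / sum(weights)


rng = random.Random(1)
n = 500
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
bandwidths = [k ** (-0.2) for k in range(1, n + 1)]  # assumed h_k = k^{-1/5}
weights_ww = [1.0] * n                               # w_n = 1 (Wolverton-Wagner)
f_rec = recursive_estimate(0.0, sample, weights_ww, bandwidths)
f_closed = weighted_estimate(0.0, sample, weights_ww, bandwidths)
print(f_rec, f_closed)  # the two forms agree up to rounding error
```

Since $\gamma_1 = w_1 / w_1 = 1$ for this family of stepsizes, the initial value $f_0(x)$ has no influence on the result; each update costs one kernel evaluation per point $x$.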

The aim of this paper is the study of the properties of the recursive estimator defined by the
stochastic approximation algorithm (1), and its comparison with the well-known nonrecursive kernel
density estimator introduced by Rosenblatt (1956) (see also Parzen (1962)), and defined as

$$\tilde f_n(x) = \frac{1}{n h_n^d} \sum_{k=1}^n K\left(\frac{x - X_k}{h_n}\right). \qquad (3)$$



We first compute the bias and the variance of the recursive estimator $f_n$ defined by (1). It
turns out that they heavily depend on the choice of the stepsize $(\gamma_n)$. In particular, for a
given bandwidth, there is a trade-off in the choice of $(\gamma_n)$ between minimizing either the bias
or the variance of $f_n$. To determine the optimal choice of stepsize, we consider two points of view:
pointwise estimation and estimation by confidence intervals.

From the pointwise estimation point of view, the criterion we consider to find the optimal stepsize
is minimizing the mean squared error (MSE) or the integrated mean squared error (MISE). We
display a set of stepsizes $(\gamma_n)$ minimizing the MSE or the MISE of the estimator $f_n$ defined
by (1); we show in particular that the sequence $(\gamma_n) = (n^{-1})$ belongs to this set. The
recursive estimator introduced by Wolverton and Wagner (1969) thus belongs to the subclass of
recursive kernel estimators which have a minimum MSE or MISE (thanks to an adequate choice of the
bandwidth, see Section 2.2). Let us underline that these minimum MSE and MISE are larger than those
obtained for Rosenblatt's nonrecursive estimator $\tilde f_n$. Thus, for pointwise estimation and when
rapid updating is not so important, it is preferable to use Rosenblatt's estimator rather than any
recursive estimator defined by the stochastic approximation algorithm (1). Let us also mention that
Hall and Patil (1994) introduce a class of on-line estimators, constructed from the class of the
recursive estimators defined in (2); their on-line estimators are not recursive any more, but
updating them requires far fewer operations than updating Rosenblatt's estimator, and their MSE and
MISE are smaller than those of the recursive estimators (2).

Let us now consider the estimation from the confidence interval point of view. Hall (1992) shows
that, to minimize the coverage error of probability density confidence intervals, avoiding bias
estimation by a slight undersmoothing is more efficient than explicit bias correction. In the
framework of undersmoothing, minimizing the MSE comes down to minimizing the variance. We thus
display a set of stepsizes $(\gamma_n)$ minimizing the variance of $f_n$; we show in particular that,
when the bandwidth $(h_n)$ varies regularly with exponent $-a$, the sequence
$(\gamma_n) = ([1 - ad]n^{-1})$ belongs to this set. Let us underline that the variance of the
estimator $f_n$ defined with this stepsize is smaller than that of Rosenblatt's estimator.
Consequently, even in the case when the on-line aspect is not important, it is preferable to use
recursive estimators to construct confidence intervals. The simulation results given in Section 3
corroborate these theoretical results.

To complete the study of the asymptotic properties of the recursive estimator $f_n$, we give its
pointwise strong convergence rate; we compare it with that of Rosenblatt's estimator $\tilde f_n$, for
which laws of the iterated logarithm were established by Hall (1981) in the case $d = 1$ and by
Arcones (1997) in the multivariate framework.

The remainder of the paper is organized as follows. In Section 2, we state our main results:
the bias and variance of $f_n$ are given in Subsection 2.1, the pointwise estimation is considered in
Subsection 2.2, the estimation by confidence intervals is developed in Subsection 2.3, and the strong
convergence rate of $f_n$ is stated in Subsection 2.4. Section 3 is devoted to our simulation results,
and Section 4 to the proof of our theoretical results.

2 Assumptions and main results 

We consider stepsizes and bandwidths, which belong to the following class of regularly varying 
sequences. 

Definition 1 Let $\gamma \in \mathbb{R}$ and $(v_n)_{n \geq 1}$ be a nonrandom positive sequence. We
say that $(v_n) \in \mathcal{GS}(\gamma)$ if

$$\lim_{n \to +\infty} n \left[1 - \frac{v_{n-1}}{v_n}\right] = \gamma. \qquad (4)$$

Condition (4) was introduced by Galambos and Seneta (1973) to define regularly varying sequences
(see also Bojanic and Seneta (1973)), and by Mokkadem and Pelletier (2007a) in the context of
stochastic approximation algorithms. Typical sequences in $\mathcal{GS}(\gamma)$ are, for
$b \in \mathbb{R}$, $n^{\gamma}(\log n)^b$, $n^{\gamma}(\log \log n)^b$, and so on.
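
Condition (4) can be checked numerically at a large but finite $n$. The sketch below is only an illustration: the two test sequences, the value of $n$ and the helper name `gs_index` are assumptions, not part of the paper.

```python
import math


def gs_index(v, n):
    """Finite-n value of n * (1 - v(n-1) / v(n)), the quantity whose limit appears in (4)."""
    return n * (1.0 - v(n - 1) / v(n))


def power(n):
    return n ** (-0.8)                     # expected limit: -0.8


def power_log(n):
    return n ** (-0.8) * math.log(n) ** 2  # same limit, reached more slowly


approx_power = gs_index(power, 10**6)
approx_power_log = gs_index(power_log, 10**6)
print(approx_power, approx_power_log)
```

The logarithmic factor does not change the limit, but it adds a correction of order $1/\log n$, so the second value is still visibly away from $-0.8$ at $n = 10^6$.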

The assumptions to which we shall refer are the following. 

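(A1) $K : \mathbb{R}^d \to \mathbb{R}$ is a continuous, bounded function satisfying
$\int_{\mathbb{R}^d} K(z)\,dz = 1$, and, for all $j \in \{1, \ldots, d\}$,
$\int_{\mathbb{R}} z_j K(z)\,dz_j = 0$ and $\int_{\mathbb{R}^d} z_j^2 |K(z)|\,dz < \infty$.

(A2) i) $(\gamma_n) \in \mathcal{GS}(-\alpha)$ with $\alpha \in ]1/2, 1]$.
ii) $(h_n) \in \mathcal{GS}(-a)$ with $a \in ]0, \alpha/d[$.
iii) $\lim_{n \to \infty} (n\gamma_n) \in ]\min\{2a, (1 - ad)/2\}, \infty]$.

(A3) $f$ is bounded, twice differentiable, and, for all $i, j \in \{1, \ldots, d\}$,
$\partial^2 f / \partial x_i \partial x_j$ is bounded.

Assumption (A2) iii) on the limit of $(n\gamma_n)$ as $n$ goes to infinity is usual in the framework
of stochastic approximation algorithms. It implies in particular that the limit of
$([n\gamma_n]^{-1})$ is finite. Throughout this paper we will use the following notation:

$$\xi = \lim_{n \to +\infty} (n\gamma_n)^{-1}, \qquad (5)$$

$$\mu_j^2 = \int_{\mathbb{R}^d} z_j^2 K(z)\,dz, \qquad f_{ij}^{(2)}(x) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x).$$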



2.1 Bias and Variance 

Our first result is the following proposition, which gives the bias and the variance of $f_n$.

Proposition 1 (Bias and Variance of $f_n$) Let Assumptions (A1)-(A3) hold, and assume that,
for all $i, j \in \{1, \ldots, d\}$, $f_{ij}^{(2)}$ is continuous at $x$.

1. If $a \leq \alpha/(d + 4)$, then

$$E(f_n(x)) - f(x) = \frac{1}{2(1 - 2a\xi)} h_n^2 \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + o(h_n^2). \qquad (6)$$

If $a > \alpha/(d + 4)$, then

$$E(f_n(x)) - f(x) = o\left(\sqrt{\gamma_n h_n^{-d}}\right). \qquad (7)$$

2. If $a \geq \alpha/(d + 4)$, then

$$Var(f_n(x)) = \frac{1}{2 - (1 - ad)\xi} \frac{\gamma_n}{h_n^d} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz + o\left(\frac{\gamma_n}{h_n^d}\right). \qquad (8)$$

If $a < \alpha/(d + 4)$, then

$$Var(f_n(x)) = o(h_n^4). \qquad (9)$$

3. If $\lim_{n \to \infty} (n\gamma_n) > \max\{2a, (1 - ad)/2\}$, then (6) and (8) hold simultaneously.

The bias and the variance of the estimator $f_n$ defined by the stochastic approximation algorithm
(1) thus heavily depend on the choice of the stepsize $(\gamma_n)$. Let us recall that the bias and
variance of Rosenblatt's estimator $\tilde f_n$ are given by:

$$E(\tilde f_n(x)) - f(x) = \frac{1}{2} h_n^2 \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + o(h_n^2), \qquad (10)$$

$$Var(\tilde f_n(x)) = \frac{1}{n h_n^d} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz + o\left(\frac{1}{n h_n^d}\right). \qquad (11)$$

To illustrate the results given by Proposition 1, we now give some examples of possible choices of
$(\gamma_n)$, and compare the bias and variance of $f_n$ with those of $\tilde f_n$.



Example 1: Choices of $(\gamma_n)$ minimizing the bias of $f_n$. In view of (6), the asymptotic bias
of $f_n(x)$ is minimum when $\xi = 0$, that is, when $(\gamma_n)$ is chosen such that
$\lim_{n \to \infty} (n\gamma_n) = \infty$, and we then have

$$E[f_n(x)] - f(x) = \frac{1}{2} h_n^2 \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + o(h_n^2).$$

In view of (10), the order of the bias of the recursive estimator $f_n$ is thus always greater than or
equal to that of Rosenblatt's estimator. Let us also mention that choosing the stepsize such that
$\lim_{n \to \infty} n\gamma_n = \infty$ (in which case the bias of $f_n$ is equivalent to that of
Rosenblatt's estimator) is strongly inadvisable, since we then have

$$\lim_{n \to \infty} \frac{Var(\tilde f_n(x))}{Var(f_n(x))} = 0.$$

Example 2: Choices of $(\gamma_n)$ minimizing the variance of $f_n$. As mentioned in the
introduction, it is advised to minimize the variance of $f_n$ for interval estimation.

Corollary 1 Let the assumptions of Proposition 1 hold with $f(x) > 0$. To minimize the asymptotic
variance of $f_n$, $\alpha$ must be chosen equal to 1, $(\gamma_n)$ must satisfy
$\lim_{n \to \infty} n\gamma_n = 1 - ad$, and we then have

$$Var[f_n(x)] = \frac{1 - ad}{n h_n^d} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz + o\left(\frac{1}{n h_n^d}\right).$$

It follows from Corollary 1 and (11) that, thanks to an adequate choice of $(\gamma_n)$, the variance
of the recursive estimator $f_n$ can be smaller than that of Rosenblatt's estimator. To see the
comparison with Rosenblatt's estimator more clearly, let us set
$(h_n) \in \mathcal{GS}(-1/[d + 4])$ (which is the choice leading in particular to the minimum mean
squared error of Rosenblatt's estimator). When $(\gamma_n)$ is chosen in $\mathcal{GS}(-1)$ and such
that $\lim_{n \to \infty} n\gamma_n = 1 - d/[d + 4]$, we have

$$\lim_{n \to \infty} \frac{E(\tilde f_n(x)) - f(x)}{E(f_n(x)) - f(x)} = \frac{1}{2}, \qquad \lim_{n \to \infty} \frac{Var(\tilde f_n(x))}{Var(f_n(x))} = \frac{d + 4}{4}.$$

It is interesting to note that, whatever the dimension $d$ is, the bias of the recursive estimator
$f_n$ is equivalent to twice that of Rosenblatt's estimator, whereas the ratio of the variances goes
to infinity as the dimension $d$ increases.

To conclude this example, let us mention that the simplest stepsize satisfying the conditions
required in Corollary 1 is $(\gamma_n) = ([1 - ad]n^{-1})$.

Example 3: The class of recursive estimators introduced by Hall and Patil (1994). The
following lemma ensures that Proposition 1 gives the bias and variance of the recursive estimators
defined in (2) and introduced by Hall and Patil (1994) for a large choice of weights $(w_n)$.

Lemma 1 Set $(w_n) \in \mathcal{GS}(w^*)$ and $(\gamma_n) = (w_n[\sum_{k=1}^n w_k]^{-1})$. If
$w^* > -1$, then $(\gamma_n) \in \mathcal{GS}(-1)$ and $\lim_{n \to \infty} n\gamma_n = 1 + w^*$.

Set $(h_n) \in \mathcal{GS}(-a)$; we give explicitly here the bias and variance of three particular
recursive estimators.

• When $(w_n) = 1$, $f_n$ is the estimator introduced by Wolverton and Wagner (1969); in view of
Lemma 1, Proposition 1 applies with $\xi = 1$, and we have

$$E(f_n(x)) - f(x) = \frac{1}{2(1 - 2a)} h_n^2 \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + o(h_n^2),$$

$$Var(f_n(x)) = \frac{1}{1 + ad} \frac{1}{n h_n^d} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz + o\left(\frac{1}{n h_n^d}\right).$$

• When $(w_n) = (h_n^{d/2})$, $f_n$ is the estimator considered by Wegman and Davies (1979); in view
of Lemma 1, Proposition 1 applies with $\xi = (1 - ad/2)^{-1}$, and we have

$$E(f_n(x)) - f(x) = \frac{2 - ad}{2(2 - [d + 4]a)} h_n^2 \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + o(h_n^2),$$

$$Var(f_n(x)) = \frac{(2 - ad)^2}{4} \frac{1}{n h_n^d} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz + o\left(\frac{1}{n h_n^d}\right).$$

• When $(w_n) = (h_n^d)$, $f_n$ is the estimator introduced by Deheuvels (1973) and whose convergence
rate was established by Duflo (1997); in view of Lemma 1, Proposition 1 applies with
$\xi = (1 - ad)^{-1}$, and we have

$$E(f_n(x)) - f(x) = \frac{1 - ad}{2(1 - [d + 2]a)} h_n^2 \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + o(h_n^2),$$

$$Var(f_n(x)) = \frac{1 - ad}{n h_n^d} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz + o\left(\frac{1}{n h_n^d}\right).$$

Let us underline that the bias and variance of this estimator are equivalent to those of the
estimator defined with the stepsize $(\gamma_n) = ([1 - ad]n^{-1})$ (this choice minimizing the
variance of $f_n$, see Corollary 1), but its updating is less straightforward.
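
The constants of the three estimators above follow mechanically from Lemma 1 and Proposition 1 once $\alpha = 1$. The sketch below recomputes them; the helper name `limit_constants` and the numerical values $a = 0.2$, $d = 1$ are assumptions chosen only for illustration.

```python
def limit_constants(w_star, a, d):
    """Asymptotic constants for weights in GS(w_star), assuming alpha = 1.

    Returns (xi, bias_factor, var_factor), where bias_factor multiplies
    h_n^2 * sum_j mu_j^2 f_jj(x) and var_factor multiplies
    f(x) * int K^2 / (n * h_n^d).
    """
    n_gamma = 1.0 + w_star                             # lim n * gamma_n (Lemma 1)
    xi = 1.0 / n_gamma
    bias_factor = 1.0 / (2.0 * (1.0 - 2.0 * a * xi))   # from the bias expansion
    var_factor = n_gamma / (2.0 - (1.0 - a * d) * xi)  # from the variance expansion
    return xi, bias_factor, var_factor


a, d = 0.2, 1                                          # illustrative values only
wolverton_wagner = limit_constants(0.0, a, d)          # w_n = 1
wegman_davies = limit_constants(-a * d / 2.0, a, d)    # w_n = h_n^{d/2}
deheuvels = limit_constants(-a * d, a, d)              # w_n = h_n^d
for name, c in (("Wolverton-Wagner", wolverton_wagner),
                ("Wegman-Davies", wegman_davies),
                ("Deheuvels", deheuvels)):
    print(name, c)
```

The variance factor uses $\gamma_n \sim (1 + w^*)/n$ to express the variance in units of $1/(n h_n^d)$, which makes the three estimators directly comparable with Rosenblatt's estimator (factor 1).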

2.2 Choice of the optimal stepsize for point estimation 

We first make explicit the choices of $(\gamma_n)$ and $(h_n)$ which minimize the MSE and MISE of the
recursive estimator defined by the stochastic approximation algorithm (1), and then provide a
comparison with Rosenblatt's estimator.

2.2.1 Choices of $(\gamma_n)$ minimizing the MSE of $f_n$

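Corollary 2 Let Assumptions (A1)-(A3) hold, assume that $f(x) > 0$,
$\sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \neq 0$, and that, for all $i, j \in \{1, \ldots, d\}$,
$f_{ij}^{(2)}$ is continuous at $x$. To minimize the MSE of $f_n$ at the point $x$, the stepsize
$(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$ and such that
$\lim_{n \to \infty} n\gamma_n = 1$, the bandwidth $(h_n)$ must equal

$$\left( \left[ \frac{d(d + 2)\, f(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{2(d + 4) \left( \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \right)^2} \right]^{\frac{1}{d+4}} n^{-\frac{1}{d+4}} \right),$$

and we then have

$$MSE = n^{-\frac{4}{d+4}} \frac{(d + 4)^{\frac{3d+8}{d+4}}}{d^{\frac{d}{d+4}}\, 4^{\frac{d+6}{d+4}}\, (d + 2)^{\frac{2d+4}{d+4}}} \left[ \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \right]^{\frac{2d}{d+4}} \left[ f(x) \int_{\mathbb{R}^d} K^2(z)\,dz \right]^{\frac{4}{d+4}} [1 + o(1)].$$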



The simplest example of a stepsize belonging to $\mathcal{GS}(-1)$ and such that
$\lim_{n \to \infty} n\gamma_n = 1$ is $(\gamma_n) = (n^{-1})$. For this choice of stepsize, the
estimator $f_n$ defined by (1) equals the recursive kernel estimator introduced by Wolverton and
Wagner (1969). This latter estimator thus belongs to the subclass of recursive kernel estimators
which, thanks to an adequate choice of the bandwidth, have a minimum MSE.
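
As an informal check of where the optimal bandwidth comes from, the following sketch keeps only the leading terms of the bias and variance of Proposition 1 with $\xi = 1$ and $a = 1/(d+4)$. The shorthand $c$, $B$ and $V$ is introduced here for convenience and is not the paper's notation; the remainder terms are ignored.

```latex
\begin{align*}
c &= \frac{1}{2(1 - 2a)} = \frac{1}{1 + ad} = \frac{d + 4}{2(d + 2)}
   \quad \text{for } a = \frac{1}{d + 4}, \qquad
B = \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x), \qquad
V = f(x) \int_{\mathbb{R}^d} K^2(z)\,dz, \\
\mathrm{MSE} &\approx c^2 B^2 h_n^4 + \frac{c\,V}{n h_n^d}, \\
0 &= 4 c^2 B^2 h_n^3 - \frac{d\,c\,V}{n h_n^{d+1}}
   \;\Longrightarrow\;
   h_n^{d+4} = \frac{d\,V}{4 c B^2 n} = \frac{d(d + 2)\,V}{2(d + 4)\,B^2\,n}, \\
\mathrm{MSE} &\approx c^2 B^2 h_n^4 \left( 1 + \frac{4}{d} \right)
   = \frac{d + 4}{d}\, c^2 B^2 h_n^4 .
\end{align*}
```

For this particular rate the bias and variance constants happen to coincide ($c$ appears squared in the first term and once in the second), which is what makes the final constant compact.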



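2.2.2 Choices of $(\gamma_n)$ minimizing the MISE of $f_n$

The following proposition gives the MISE of the estimator $f_n$.

Proposition 2 Let Assumptions (A1)-(A3) hold, and assume that, for all
$i, j \in \{1, \ldots, d\}$, $f_{ij}^{(2)}$ is continuous and integrable.

1. If $a < \alpha/(d + 4)$, then

$$MISE = \frac{1}{4(1 - 2a\xi)^2} h_n^4 \int_{\mathbb{R}^d} \left( \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \right)^2 dx + o(h_n^4).$$

2. If $a = \alpha/(d + 4)$, then

$$MISE = \frac{1}{4(1 - 2a\xi)^2} h_n^4 \int_{\mathbb{R}^d} \left( \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \right)^2 dx + \frac{1}{2 - (1 - ad)\xi} \frac{\gamma_n}{h_n^d} \int_{\mathbb{R}^d} K^2(z)\,dz + o\left( h_n^4 + \frac{\gamma_n}{h_n^d} \right).$$

3. If $a > \alpha/(d + 4)$, then

$$MISE = \frac{1}{2 - (1 - ad)\xi} \frac{\gamma_n}{h_n^d} \int_{\mathbb{R}^d} K^2(z)\,dz + o\left( \frac{\gamma_n}{h_n^d} \right).$$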



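The following corollary ensures that Wolverton and Wagner's estimator also belongs to the
subclass of kernel estimators defined by the stochastic approximation algorithm (1), which, thanks
to an adequate choice of the bandwidth, have a minimum MISE.

Corollary 3 Let Assumptions (A1)-(A3) hold, and assume that, for all
$i, j \in \{1, \ldots, d\}$, $f_{ij}^{(2)}$ is continuous and integrable. To minimize the MISE of
$f_n$, the stepsize $(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$ and such that
$\lim_{n \to \infty} n\gamma_n = 1$, the bandwidth $(h_n)$ must equal

$$\left( \left[ \frac{d(d + 2) \int_{\mathbb{R}^d} K^2(z)\,dz}{2(d + 4) \int_{\mathbb{R}^d} \left( \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \right)^2 dx} \right]^{\frac{1}{d+4}} n^{-\frac{1}{d+4}} \right),$$

and we then have

$$MISE = n^{-\frac{4}{d+4}} \frac{(d + 4)^{\frac{3d+8}{d+4}}}{d^{\frac{d}{d+4}}\, 4^{\frac{d+6}{d+4}}\, (d + 2)^{\frac{2d+4}{d+4}}} \left[ \int_{\mathbb{R}^d} \left( \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) \right)^2 dx \right]^{\frac{d}{d+4}} \left[ \int_{\mathbb{R}^d} K^2(z)\,dz \right]^{\frac{4}{d+4}} [1 + o(1)].$$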



2.2.3 Comparison with Rosenblatt's estimator 

The ratio of the optimal MSE (or MISE) of Rosenblatt's estimator to that of Wolverton and
Wagner's estimator equals

$$\rho(d) = \left[ \frac{4 (d + 2)^{2d+4}}{(d + 4)^{2d+4}} \right]^{\frac{1}{d+4}}.$$

This ratio is always less than one, it at first decreases, and then increases to one as the dimension
$d$ increases. This phenomenon is similar to that observed by Hall and Patil (1994). These
authors consider the univariate framework, but look at the efficiency of Wolverton and Wagner's
estimator of the $s$th-order derivative of $f$ relative to Rosenblatt's one; the ratio $\rho(s)$
varies in $s$ in the same way as $\rho(d)$ does in $d$. From the pointwise estimation point of view,
and when rapid updating is not too important, it is thus preferable to use Rosenblatt's nonrecursive
estimator rather than any recursive estimator defined by the stochastic approximation algorithm (1).
Let us mention that Hall and Patil (1994) introduce a class of on-line estimators, constructed from
the class of the recursive estimators defined in (2); their on-line estimators are not recursive any
more, but updating them requires far fewer operations than updating Rosenblatt's estimator, and their
MSE and MISE are smaller than those of the recursive estimators (2).



2.3 Choice of the optimal stepsize for interval estimation 

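Let us first state the following theorem, which gives the weak convergence rate of the estimator
$f_n$ defined in (1).

Theorem 1 (Weak pointwise convergence rate) Let Assumptions (A1)-(A3) hold, assume
that $f(x) > 0$ and that, for all $i, j \in \{1, \ldots, d\}$, $f_{ij}^{(2)}$ is continuous at $x$.

1. If there exists $c \geq 0$ such that $\gamma_n^{-1} h_n^{d+4} \to c$, then

$$\sqrt{\gamma_n^{-1} h_n^d}\, (f_n(x) - f(x)) \xrightarrow{\mathcal{D}} \mathcal{N}\left( \frac{c^{1/2}}{2(1 - 2a\xi)} \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x),\ \frac{1}{2 - (1 - ad)\xi} f(x) \int_{\mathbb{R}^d} K^2(z)\,dz \right).$$

2. If $\gamma_n^{-1} h_n^{d+4} \to \infty$, then

$$\frac{1}{h_n^2} (f_n(x) - f(x)) \xrightarrow{\mathbb{P}} \frac{1}{2(1 - 2a\xi)} \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x),$$

where $\xrightarrow{\mathcal{D}}$ denotes the convergence in distribution, $\mathcal{N}$ the Gaussian
distribution and $\xrightarrow{\mathbb{P}}$ the convergence in probability.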

As mentioned in the introduction, Hall (1992) shows that, to minimize the coverage error of
probability density confidence intervals, avoiding bias estimation by a slight undersmoothing is
more efficient than bias correction. Let us recall that, when the bandwidth $(h_n)$ is chosen such
that $\lim_{n \to \infty} n h_n^{d+4} = 0$ (which corresponds to undersmoothing), Rosenblatt's
estimator fulfills the central limit theorem

$$\sqrt{n h_n^d}\, (\tilde f_n(x) - f(x)) \xrightarrow{\mathcal{D}} \mathcal{N}\left( 0,\ f(x) \int_{\mathbb{R}^d} K^2(z)\,dz \right). \qquad (13)$$

Now, let $\Phi$ denote the distribution function of the $\mathcal{N}(0, 1)$, let $t_{\alpha/2}$ be
such that $\Phi(t_{\alpha/2}) = 1 - \alpha/2$ (where $\alpha \in ]0, 1[$), and set

$$I_{g_n}(x) = \left[ g_n(x) - t_{\alpha/2}\, C(g_n) \sqrt{\frac{g_n(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{n h_n^d}},\ g_n(x) + t_{\alpha/2}\, C(g_n) \sqrt{\frac{g_n(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{n h_n^d}} \right].$$

In view of (13), the asymptotic level of $I_{\tilde f_n}(x)$ equals $1 - \alpha$ for
$C(\tilde f_n) = 1$. The following corollary gives the values of $C(f_n)$ for which the asymptotic
level of $I_{f_n}(x)$ equals $1 - \alpha$ too.

Corollary 4 Let the assumptions of Theorem 1 hold with
$\lim_{n \to \infty} n\gamma_n = \gamma_0 \in ]0, \infty[$ and
$\lim_{n \to \infty} n h_n^{d+4} = 0$. The asymptotic level of $I_{f_n}(x)$ equals $1 - \alpha$ for

$$C(f_n) = \sqrt{\gamma_0 \left[ 2 - (1 - ad)\gamma_0^{-1} \right]^{-1}}.$$

Moreover, the minimum of $C(f_n)$ is reached at $\gamma_0 = 1 - ad$ and equals $\sqrt{1 - ad}$.

The optimal stepsizes for interval estimation are thus the sequences
$(\gamma_n) \in \mathcal{GS}(-1)$ such that $\lim_{n \to \infty} n\gamma_n = 1 - ad$, the simplest
one being $(\gamma_n) = ([1 - ad]n^{-1})$. Of course, these stepsizes are those which minimize the
variance of $f_n$ (see Corollary 1).
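
The minimization in Corollary 4 can be checked numerically. The sketch below evaluates the constant $C(f_n) = \sqrt{\gamma_0 [2 - (1 - ad)\gamma_0^{-1}]^{-1}}$ on a grid of values of $\gamma_0$; the values $a = 0.21$, $d = 1$, the grid and the function name are assumptions made only for illustration.

```python
import math


def interval_constant(gamma0, a, d):
    """C(f_n) as a function of gamma0; requires gamma0 > (1 - a*d) / 2."""
    return math.sqrt(gamma0 / (2.0 - (1.0 - a * d) / gamma0))


a, d = 0.21, 1                                   # illustrative values only
grid = [0.45 + 0.001 * i for i in range(2000)]   # gamma0 between 0.45 and 2.449
best = min(grid, key=lambda g: interval_constant(g, a, d))
print(best, interval_constant(best, a, d), math.sqrt(1.0 - a * d))
```

The grid starts above $(1 - ad)/2$, where the expression under the square root is positive. At $\gamma_0 = 1$ (Wolverton and Wagner's stepsize) the same function returns $1/\sqrt{1 + ad}$, which is larger than the minimum $\sqrt{1 - ad}$ but still smaller than the value 1 attached to Rosenblatt's estimator.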

2.4 Strong pointwise convergence rate 

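The following theorem gives the strong pointwise convergence rate of $f_n$.

Theorem 2 (Strong pointwise convergence rate) Let Assumptions (A1)-(A3) hold, and
assume that, for all $i, j \in \{1, \ldots, d\}$, $f_{ij}^{(2)}$ is continuous at $x$.

1. If there exists $c_1 \geq 0$ such that
$\gamma_n^{-1} h_n^{d+4} / (\ln[\sum_{k=1}^n \gamma_k]) \to c_1$, then, with probability one, the
sequence

$$\left( \sqrt{\frac{\gamma_n^{-1} h_n^d}{2 \ln[\sum_{k=1}^n \gamma_k]}}\, (f_n(x) - f(x)) \right)$$

is relatively compact and its limit set is the interval

$$\left[ \frac{1}{2(1 - 2a\xi)} \sqrt{\frac{c_1}{2}} \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) - \sqrt{\frac{f(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{2 - (1 - ad)\xi}},\ \frac{1}{2(1 - 2a\xi)} \sqrt{\frac{c_1}{2}} \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x) + \sqrt{\frac{f(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{2 - (1 - ad)\xi}} \right].$$

2. If $\gamma_n^{-1} h_n^{d+4} / (\ln[\sum_{k=1}^n \gamma_k]) \to \infty$, then, with probability
one,

$$\lim_{n \to \infty} \frac{1}{h_n^2} (f_n(x) - f(x)) = \frac{1}{2(1 - 2a\xi)} \sum_{j=1}^d \mu_j^2 f_{jj}^{(2)}(x).$$

Set $(h_n)$ such that $\lim_{n \to \infty} n h_n^{d+4} / \ln \ln n = 0$. Arcones (1997) proves the
following compact law of the iterated logarithm for Rosenblatt's estimator: with probability one, the
sequence $(\sqrt{n h_n^d}\, (\tilde f_n(x) - f(x)) / \sqrt{2 \ln \ln n})$ is relatively compact and
its limit set is the interval

$$J = \left[ -\sqrt{f(x) \int_{\mathbb{R}^d} K^2(z)\,dz},\ \sqrt{f(x) \int_{\mathbb{R}^d} K^2(z)\,dz} \right].$$

Now, set $(\gamma_n)$ such that $\lim_{n \to \infty} n\gamma_n = \gamma_0 \in ]0, \infty[$. The first
part of Theorem 2 ensures that, with probability one, the limit set of the sequence
$(\sqrt{n h_n^d}\, (f_n(x) - f(x)) / \sqrt{2 \ln \ln n})$ is the interval

$$J(\gamma_0) = \left[ -A(\gamma_0) \sqrt{f(x) \int_{\mathbb{R}^d} K^2(z)\,dz},\ A(\gamma_0) \sqrt{f(x) \int_{\mathbb{R}^d} K^2(z)\,dz} \right]
\quad \text{with} \quad A(\gamma_0) = \sqrt{\gamma_0 \left[ 2 - (1 - ad)\gamma_0^{-1} \right]^{-1}}.$$

In particular, for Wolverton and Wagner's estimator, $A(\gamma_0) = 1/\sqrt{1 + ad}$; for the
estimator considered by Wegman and Davies (1979), or when $(\gamma_n) = ([1 - ad/2]n^{-1})$,
$A(\gamma_0) = 1 - ad/2$; for the estimator considered by Deheuvels (1973) and Duflo (1997), or when
$(\gamma_n) = ([1 - ad]n^{-1})$, $A(\gamma_0) = \sqrt{1 - ad}$. For all these recursive estimators,
the length of the limit interval $J(\gamma_0)$ is smaller than that of $J$, which shows that they are
more concentrated around $f$ than Rosenblatt's estimator is.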



3 Simulations 

The aim of our simulation studies is to compare the performance of Rosenblatt's estimator defined
in (3) with that of the recursive estimators, from the confidence interval point of view. Of course,
the recursive estimator we consider here is the optimal one according to this criterion (see
Corollary 4). We set:

$$I_{i,n} = \left[ g_n(x) - 1.96\, C(g_n) \sqrt{\frac{g_n(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{n h_n^d}},\ g_n(x) + 1.96\, C(g_n) \sqrt{\frac{g_n(x) \int_{\mathbb{R}^d} K^2(z)\,dz}{n h_n^d}} \right]$$

where:

• if $i = 1$, then $g_n = \tilde f_n$ is Rosenblatt's estimator, and $C(g_n) = 1$;

• if $i = 2$, then $g_n = f_n$ is the optimal recursive estimator defined by the algorithm (1) with
the stepsize $(\gamma_n) = ([1 - ad]n^{-1})$, and $C(g_n) = \sqrt{1 - ad}$.

According to the theoretical results given in Section 2.3, both confidence intervals $I_{1,n}$ and
$I_{2,n}$ have the same asymptotic level (equal to 95%), whereas $I_{2,n}$ has a smaller length than
$I_{1,n}$. In order to investigate their finite sample behaviours, we consider three sample sizes:
$n = 50$, $n = 100$, and $n = 200$. In each case, the number of simulations is $N = 5000$. Tables 1-4
give (for different values of $d$, $f$, $x$, and $(h_n)$):

• the empirical levels $\#\{f(x) \in I_{i,n}\}/N$ at each first line concerning $I_{i,n}$;

• the averaged lengths of the intervals $I_{i,n}$ at each second line concerning $I_{i,n}$.

The case d = 1. In the univariate framework, we consider two densities $f$: the standard normal
$\mathcal{N}(0, 1)$ distribution (see Table 1), and the normal mixture
$\frac{1}{2}\mathcal{N}(-\frac{1}{2}, 1) + \frac{1}{2}\mathcal{N}(\frac{1}{2}, 1)$ distribution (see
Table 2). The points at which $f$ is estimated are: $x = 0$, $x = 0.5$, and $x = 1$. The bandwidth
$(h_n)$ is set equal to $(n^{-a})$ with $a = 0.21$ and $a = 0.23$ (the parameter $a$ being chosen
slightly larger than 1/5 to slightly undersmooth). Both tables show that the recursive estimator
performs better than Rosenblatt's one: the empirical levels of the intervals $I_{2,n}$ are greater
than those of $I_{1,n}$, whereas their averaged lengths are smaller.
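
The construction of the two intervals can be sketched as follows for $d = 1$ and the standard normal density. This is only an illustrative sketch: the Gaussian kernel, the initial value $f_0(x) = 0$, the number of replications, the seed and all function names are assumptions, so the figures it prints are not expected to reproduce those of the tables.

```python
import math
import random

INT_K2 = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K^2 for the Gaussian kernel


def kernel(z):
    """Gaussian kernel (an assumed choice of K)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def rosenblatt(x, sample, h):
    """Nonrecursive kernel estimator with a single bandwidth h (d = 1)."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def recursive(x, sample, a):
    """Recursive estimator with gamma_n = (1 - a) / n and h_n = n^{-a} (d = 1)."""
    f = 0.0  # assumed initial value f_0(x)
    for n, xn in enumerate(sample, start=1):
        gamma_n, h_n = (1.0 - a) / n, n ** (-a)
        f = (1.0 - gamma_n) * f + gamma_n * kernel((x - xn) / h_n) / h_n
    return f


def interval(g, c, n, h):
    """Interval g +/- 1.96 * c * sqrt(g * int K^2 / (n * h)) for d = 1."""
    half = 1.96 * c * math.sqrt(g * INT_K2 / (n * h))
    return g - half, g + half


def simulate(x=0.0, n=100, a=0.21, n_rep=500, seed=0):
    """Return {name: (empirical level, averaged length)} for I1 and I2."""
    rng = random.Random(seed)
    truth = kernel(x)  # the N(0, 1) density at x
    h = n ** (-a)
    stats = {"I1": [0, 0.0], "I2": [0, 0.0]}
    for _ in range(n_rep):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for name, g, c in (("I1", rosenblatt(x, sample, h), 1.0),
                           ("I2", recursive(x, sample, a), math.sqrt(1.0 - a))):
            lo, hi = interval(g, c, n, h)
            stats[name][0] += lo <= truth <= hi
            stats[name][1] += hi - lo
    return {k: (cov / n_rep, length / n_rep) for k, (cov, length) in stats.items()}


result = simulate()
print(result)
```

Finite-sample levels obtained this way depend on the kernel, on $f_0$ and on the number of replications; the sketch is meant to show how $I_{1,n}$ and $I_{2,n}$ are built, not to settle which one has the better coverage.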

The case d = 2. In the case when $d = 2$, we estimate the density $f$ of the random vector $X$
defined as $X = AY$ with $A$ a $2 \times 2$ matrix [entries illegible in the source], and where the
distribution of the random vector $Y$ is:

• the normal standard distribution $\mathcal{N}(0, I_2)$ (see Table 3);

• the normal mixture $\frac{1}{2}\mathcal{N}(-B, I_2) + \frac{1}{2}\mathcal{N}(B, I_2)$ with $B$ a
vector of $\mathbb{R}^2$ [entries illegible in the source] (see Table 4).

The points at which $f$ is estimated are: $x = (0, 0)$, $x = (0.5, 0.5)$, and $x = (1, 1)$. The
bandwidth $(h_n)$ is set equal to $(n^{-a})$. To slightly undersmooth, the parameter $a$ must be
chosen slightly larger than 1/6; we first chose $a = 0.17$ and $a = 0.19$. Tables 3 and 4 show that,
for these given values of the parameter $a$, the recursive estimator performs better for the sample
sizes $n = 50$ and $n = 100$, whereas, at first glance, Rosenblatt's estimator performs better in the
case when $n = 200$. This is explained by the fact that, for this latter sample size, the length of
$I_{2,n}$ becomes too small. We have thus added other choices of the parameter $a$ ($a = 0.21$ in
Table 3; $a = 0.21$ and $a = 0.24$ in Table 4). The larger $a$ is, the larger the length of the
intervals $I_{i,n}$ is, and the larger the empirical levels are. Now, Tables 3 and 4 also show that,
for the sample size $n = 200$, the intervals $I_{2,n}$
Table 1: $X \sim \mathcal{N}(0, 1)$

                  x = 0                      x = 0.5                    x = 1
                  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200
a = 0.21
I_{1,n}   level   96.74%   96.08%   95.74%   97.1%    96.74%   96.96%   97.72%   97.44%   97.7%
          length  0.2681   0.2061   0.158    0.2538   0.1948   0.1493   0.2168   0.165    0.126
I_{2,n}   level   99.36%   98%      96.18%   99.76%   98.96%   98.36%   98.86%   98.76%   98.78%
          length  0.2436   0.184    0.140    0.2332   0.1755   0.1331   0.2068   0.1529   0.1146
a = 0.23
I_{1,n}   level   96.58%   96.46%   96.78%   96.78%   97.06%   97.04%   97.32%   97.58%   96.96%
          length  0.2796   0.2167   0.1674   0.2653   0.205    0.1579   0.225    0.1731   0.1328
I_{2,n}   level   99.46%   98.58%   97.58%   99.6%    99.26%   98.72%   98.68%   98.32%   97.96%
          length  0.2517   0.1915   0.1467   0.2415   0.1828   0.1393   0.2134   0.159    0.1197


computed with $a = 0.21$ or $a = 0.24$ have a smaller length and a higher level than the intervals
$I_{1,n}$ computed with $a = 0.17$ or $a = 0.19$, so that we can say again that the recursive
estimator performs better than Rosenblatt's one.

This simulation study shows the good performance of the recursive estimator defined by the
algorithm (1) with the stepsize $(\gamma_n) = ([1 - ad]n^{-1})$ for interval estimation. The main
question which remains open is how to choose the bandwidth $(h_n)$ in $\mathcal{GS}(-a)$, and, in
particular, how to determine the parameter $a$. This problem is not particular to the framework of
recursive estimation; in the case when Rosenblatt's estimator is used, Hall (1992) points out that
criteria to determine the "good undersmoothing" are not easy to determine empirically.

Table 2: $X \sim \frac{1}{2}\mathcal{N}(-\frac{1}{2}, 1) + \frac{1}{2}\mathcal{N}(\frac{1}{2}, 1)$

                  x = 0                      x = 0.5                    x = 1
                  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200
a = 0.21
I_{1,n}   level   96.86%   96.96%   96.86%   96.96%   96.68%   96.8%    97.12%   97.04%   96.94%
          length  0.2541   0.1949   0.1493   0.2436   0.1866   0.1427   0.2142   0.1642   0.1251
I_{2,n}   level   99.76%   99.04%   98.2%    99.62%   99.28%   98.72%   99.14%   98.94%   98.4%
          length  0.2334   0.1755   0.1331   0.2257   0.1692   0.1278   0.2045   0.1518   0.1136
a = 0.23
I_{1,n}   level   96.92%   97.04%   96.84%   96.56%   96.66%   97.14%   97.02%   97.12%   96.76%
          length  0.2654   0.2049   0.1579   0.254    0.196    0.151    0.2233   0.1717   0.1321
I_{2,n}   level   99.9%    99.18%   98.76%   99.74%   99.3%    98.92%   98.78%   98.76%   98.2%
          length  0.2416   0.1826   0.1393   0.2334   0.176    0.1338   0.2116   0.1575   0.1187


Table 3: $X = AY$ with $Y \sim \mathcal{N}(0, I_2)$

                  x = (0,0)                  x = (0.5,0.5)              x = (1,1)
                  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200
a = 0.17
I_{1,n}   level   93.82%   94.98%   96.9%    91.06%   92.82%   94.0%    89.48%   86.88%   85.82%
          length  0.1159   0.0934   0.0757   0.1059   0.0854   0.0686   0.0811   0.0645   0.0515
I_{2,n}   level   97.54%   95.12%   94.34%   96.74%   94.62%   92.86%   97.2%    94.32%   91.16%
          length  0.0979   0.0765   0.061    0.091    0.0707   0.0558   0.0736   0.0557   0.0432
a = 0.19
I_{1,n}   level   95.64%   97.08%   97.28%   93.46%   94.84%   95.82%   91.58%   91.06%   89.04%
          length  0.1271   0.1042   [?]      0.1158   0.0946   0.077    0.0883   0.0713   0.0574
I_{2,n}   level   97.5%    97.26%   96.64%   97.22%   96.5%    95.42%   96.74%   95.66%   92.24%
          length  0.1045   0.0829   0.0666   0.0969   0.0763   0.0609   0.0783   0.0599   0.0469
a = 0.21
I_{1,n}   level   96.68%   97.62%   98.24%   95.16%   96.48%   97.16%   92.76%   91.2%    91.04%
          length  0.1392   0.1157   0.0957   0.1267   0.105    0.0863   0.0962   0.0783   0.0641
I_{2,n}   level   97.16%   97.48%   97.56%   96.96%   96.84%   96.7%    96.72%   96.58%   94.2%
          length  0.1111   0.0893   0.0726   0.1031   0.0822   0.0662   0.0832   0.0642   0.0509

[?]: value illegible in the source.

Table 4: $X = AY$ with $Y \sim \frac{1}{2}\mathcal{N}(-B, I_2) + \frac{1}{2}\mathcal{N}(B, I_2)$

                  x = (0,0)                  x = (0.5,0.5)              x = (1,1)
                  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200  n = 50   n = 100  n = 200
a = 0.17
I_{1,n}   level   91.84%   91.28%   92.4%    90.06%   89.42%   87.86%   83.24%   80.46%   78.88%
          length  0.105    0.0847   0.068    0.0976   0.0785   0.063    0.0787   0.0631   0.050
I_{2,n}   level   96.8%    93.76%   91.34%   95.9%    92.32%   86.96%   95.52%   87.6%    82.12%
          length  0.0903   0.0702   0.0553   0.0851   0.0657   0.0516   0.0716   0.0544   0.0419
a = 0.19
I_{1,n}   level   93.54%   93.94%   [?]      90.72%   91.38%   [?]      85.46%   84.24%   82.24%
          length  0.1151   0.094    [?]      0.1158   0.1069   0.070[?] 0.0857   0.0692   0.0457
I_{2,n}   level   97.42%   95.92%   94.38%   97.22%   97.06%   91.74%   96.18%   91.26%   86.88%
          length  0.0964   0.0757   0.0604   0.0969   0.0908   0.0562   0.0762   0.0582   0.0469
a = 0.21
I_{1,n}   level   94.82%   96.12%   97.44%   93.14%   93.46%   94.16%   88.72%   86.24%   83.54%
          length  0.1259   0.1037   0.0858   0.1163   0.0962   0.0793   0.0935   0.0764   0.0624
I_{2,n}   level   97.1%    97.48%   96.96%   96.82%   96.04%   93.96%   96.76%   93.52%   88.24%
          length  0.1025   0.0813   0.0659   0.0963   0.0762   0.0613   0.0811   0.0627   0.0495
a = 0.24
I_{1,n}   level   96.26%   97.48%   98.38%   94.36%   96.16%   96.7%    91.04%   91.08%   89.42%
          length  0.1435   0.1208   0.1017   0.1325   0.1117   0.0937   0.1058   0.0885   0.0736
I_{2,n}   level   96.18%   97.54%   98.04%   96.68%   97.38%   96.69%   96.98%   95.96%   91.3%
          length  0.1117   0.0903   [?]      0.1049   0.0845   [?]      0.0883   0.0695   0.0558

[?]: value illegible in the source.

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4 Proofs 

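Throughout this section we use the following notation:

$$\Pi_n = \prod_{j=1}^{n} (1 - \gamma_j), \qquad s_n = \sum_{k=1}^{n} \gamma_k, \qquad Z_n(x) = h_n^{-d} K\left(\frac{x - X_n}{h_n}\right). \tag{14}$$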

Let us first state the following technical lemma. 

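Lemma 2 Let $(v_n) \in \mathcal{GS}(v^*)$, $(\gamma_n) \in \mathcal{GS}(-\alpha)$, and $m > 0$ such that $m - v^*\xi > 0$, where $\xi$ is defined in (5). We have

$$\lim_{n\to+\infty} v_n \Pi_n^{m} \sum_{k=1}^{n} \Pi_k^{-m} \frac{\gamma_k}{v_k} = \frac{1}{m - v^*\xi}. \tag{15}$$

Moreover, for all positive sequence $(\alpha_n)$ such that $\lim_{n\to+\infty} \alpha_n = 0$, and all $\delta \in \mathbb{R}$,

$$\lim_{n\to+\infty} v_n \Pi_n^{m} \left[\sum_{k=1}^{n} \Pi_k^{-m} \frac{\gamma_k}{v_k}\alpha_k + \delta\right] = 0. \tag{16}$$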

Lemma 2 is widely applied throughout the proofs. Let us underline that it is its application which requires Assumption (A2)iii) on the limit of $(n\gamma_n)$ as $n$ goes to infinity. Let us mention that, in particular, to prove (8), Lemma 2 is applied with $m = 2$ and $(v_n) = (\gamma_n^{-1}h_n^{d})$ (and thus $v^* = \alpha - ad$); the stepsize $(\gamma_n)$ must thus fulfill the condition $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$. Now, since $\lim_{n\to\infty}(n\gamma_n) < \infty$ only if $\alpha = 1$, the condition $\lim_{n\to\infty}(n\gamma_n) \in \,]\min\{2a, (1 - ad)/2\}, \infty]$ in (A2)iii) is equivalent to the condition $\lim_{n\to\infty}(n\gamma_n) \in \,]\min\{2a, (\alpha - ad)/2\}, \infty]$, which appears throughout our proofs. Similarly, since $\xi \neq 0$ only if $\alpha = 1$, the limit $[2 - (\alpha - ad)\xi]^{-1}$ given by the application of Lemma 2 for such $m$ and $(v_n)$ equals the factor $[2 - (1 - ad)\xi]^{-1}$ that stands in the statement of our main results.
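As an informal sanity check of (15), the following sketch evaluates the left-hand side numerically. It is only an illustration under assumed sequences that are not taken from the paper: $\gamma_k = \gamma_0/k$ (so that $\xi = 1/\gamma_0$) and $v_k = k^{v^*}$; the function names are hypothetical.

```python
def lemma2_ratio(n_max, gamma0=0.8, v_star=2.0 / 3.0, m=2):
    """Return R_n = v_n * Pi_n^m * sum_{k<=n} Pi_k^{-m} * gamma_k / v_k at n = n_max.

    Assumed (illustrative) sequences: gamma_k = gamma0 / k and v_k = k ** v_star.
    R_n satisfies R_n = (v_n / v_{n-1}) * (1 - gamma_n)^m * R_{n-1} + gamma_n,
    which avoids forming the small product Pi_n explicitly.
    """
    ratio = gamma0  # R_1 = gamma_1
    for n in range(2, n_max + 1):
        gamma_n = gamma0 / n
        growth = (n / (n - 1)) ** v_star  # v_n / v_{n-1}
        ratio = growth * (1.0 - gamma_n) ** m * ratio + gamma_n
    return ratio


def lemma2_limit(gamma0=0.8, v_star=2.0 / 3.0, m=2):
    """Limit 1 / (m - v_star * xi) stated in (15), with xi = 1 / gamma0."""
    xi = 1.0 / gamma0
    return 1.0 / (m - v_star * xi)
```

With these assumed values, `lemma2_ratio(100000)` and `lemma2_limit()` should agree to several decimal places; the recursion used in the code is the same one that drives the proof of (16).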

Our proofs are now organized as follows. Lemmas 1 and 2 are proved in Section 4.1, Propositions 1 and 2 in Sections 4.2 and 4.3 respectively, Theorems 1 and 2 in Sections 4.4 and 4.5 respectively, and Corollaries 1 to 4 in Section 4.6.

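4.1 Proof of Lemmas 1 and 2

We first prove Lemma 1. Since $(w_n) \in \mathcal{GS}(w^*)$ with $w^* > -1$, we have

$$\lim_{n\to\infty} \frac{n w_n}{\sum_{k=1}^{n} w_k} = 1 + w^*, \tag{17}$$

which guarantees that $\lim_{n\to\infty} n\gamma_n = 1 + w^*$. Moreover, applying (17), we note that

$$\frac{\sum_{k=1}^{n-1} w_k}{\sum_{k=1}^{n} w_k} = 1 - \frac{w_n}{\sum_{k=1}^{n} w_k} = 1 - \frac{1 + w^*}{n} + o\left(\frac{1}{n}\right),$$

so that

$$\lim_{n\to\infty} n\left[1 - \frac{\sum_{k=1}^{n-1} w_k}{\sum_{k=1}^{n} w_k}\right] = 1 + w^*.$$

It follows that $\left(\sum_{k=1}^{n} w_k\right) \in \mathcal{GS}(1 + w^*)$, and thus that $(\gamma_n) \in \mathcal{GS}(-1)$, which concludes the proof of Lemma 1.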
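The limit in (17) can be illustrated numerically. The sketch below assumes the particular weights $w_k = k^{w^*}$ (an illustrative choice, not one made in the paper) and the stepsize $\gamma_n = w_n / \sum_{k \le n} w_k$ used in the proof above; the function name is hypothetical.

```python
def averaged_stepsize(n_max, w_star=0.5):
    """Return n * gamma_n at n = n_max for gamma_n = w_n / sum_{k<=n} w_k.

    Assumed (illustrative) weights: w_k = k ** w_star with w_star > -1.
    """
    total = 0.0
    for k in range(1, n_max + 1):
        total += k ** w_star
    gamma_n = n_max ** w_star / total
    return n_max * gamma_n
```

For `w_star = 0.5` the returned value should be close to $1 + w^* = 1.5$ for large `n_max`, and for `w_star = 0` (equal weights, $\gamma_n = 1/n$) it equals 1 up to rounding.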

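To prove Lemma 2, we first establish (16). Set

$$Q_n = v_n \Pi_n^{m}\left[\sum_{k=1}^{n} \Pi_k^{-m}\gamma_k v_k^{-1}\alpha_k + \delta\right].$$

We have

$$Q_n = \frac{v_n}{v_{n-1}}(1 - \gamma_n)^m Q_{n-1} + \gamma_n\alpha_n$$

with, since $(v_n) \in \mathcal{GS}(v^*)$ and in view of (5),

$$\begin{aligned}
\frac{v_n}{v_{n-1}}(1 - \gamma_n)^m &= \left(1 + \frac{v^*}{n} + o\left(\frac{1}{n}\right)\right)\left(1 - m\gamma_n + o(\gamma_n)\right) \\
&= \left(1 + v^*\xi\gamma_n + o(\gamma_n)\right)\left(1 - m\gamma_n + o(\gamma_n)\right) \\
&= 1 - (m - v^*\xi)\gamma_n + o(\gamma_n).
\end{aligned} \tag{18}$$

Set $A \in \,]0, m - v^*\xi[$; for $n$ large enough, we obtain

$$Q_n \le (1 - A\gamma_n)Q_{n-1} + \gamma_n\alpha_n$$

and (16) follows straightforwardly from the application of Lemma 4.I.1 in Duflo (1996). Now, let $C$ denote a positive generic constant that may vary from line to line; we have

$$v_n\Pi_n^{m}\sum_{k=1}^{n}\Pi_k^{-m}\gamma_k v_k^{-1} - (m - v^*\xi)^{-1} = v_n\Pi_n^{m}\left[\sum_{k=1}^{n}\Pi_k^{-m}\gamma_k v_k^{-1} - (m - v^*\xi)^{-1}P_n\right]$$

with, in view of (18),

$$\begin{aligned}
P_n = v_n^{-1}\Pi_n^{-m} &= \sum_{k=2}^{n}\left(v_k^{-1}\Pi_k^{-m} - v_{k-1}^{-1}\Pi_{k-1}^{-m}\right) + C \\
&= \sum_{k=2}^{n} v_k^{-1}\Pi_k^{-m}\left[1 - \frac{v_k}{v_{k-1}}(1 - \gamma_k)^m\right] + C \\
&= \sum_{k=2}^{n} v_k^{-1}\Pi_k^{-m}\left[(m - v^*\xi)\gamma_k + o(\gamma_k)\right] + C.
\end{aligned}$$

It follows that

$$v_n\Pi_n^{m}\sum_{k=1}^{n}\Pi_k^{-m}\gamma_k v_k^{-1} - (m - v^*\xi)^{-1} = v_n\Pi_n^{m}\left[\sum_{k=1}^{n} v_k^{-1}\Pi_k^{-m}o(\gamma_k) + C\right]$$

and (15) follows from the application of (16), which concludes the proof of Lemma 2.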






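4.2 Proof of Proposition 1

In view of (1) and (14), we have

$$\begin{aligned}
f_n(x) - f(x) &= (1 - \gamma_n)\left(f_{n-1}(x) - f(x)\right) + \gamma_n\left(Z_n(x) - f(x)\right) \\
&= \sum_{k=1}^{n-1}\left[\prod_{j=k+1}^{n}(1 - \gamma_j)\right]\gamma_k\left(Z_k(x) - f(x)\right) + \gamma_n\left(Z_n(x) - f(x)\right) + \left[\prod_{j=1}^{n}(1 - \gamma_j)\right]\left(f_0(x) - f(x)\right) \\
&= \Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left(Z_k(x) - f(x)\right) + \Pi_n\left(f_0(x) - f(x)\right).
\end{aligned} \tag{19}$$

It follows that

$$\mathbb{E}(f_n(x)) - f(x) = \Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left(\mathbb{E}(Z_k(x)) - f(x)\right) + \Pi_n\left(f_0(x) - f(x)\right).$$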
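The unrolling behind (19) can be checked on simulated data. The sketch below is an illustration only, under assumptions that are not part of the paper: dimension $d = 1$, a Gaussian kernel, $\gamma_n = \gamma_0/n$ and $h_n = n^{-a}$; all function names are hypothetical. It compares the one-step recursion with the closed form $\Pi_n\sum_k \Pi_k^{-1}\gamma_k Z_k(x) + \Pi_n f_0(x)$.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice of K)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def z_term(x, x_k, h_k):
    """Z_k(x) = h_k^{-d} K((x - X_k) / h_k) in dimension d = 1."""
    return gaussian_kernel((x - x_k) / h_k) / h_k


def recursive_estimate(x, sample, gamma0=0.8, a=0.2, f0=0.0):
    """Run f_n = (1 - gamma_n) f_{n-1} + gamma_n Z_n, one observation at a time."""
    f = f0
    for n, x_n in enumerate(sample, start=1):
        gamma_n = gamma0 / n
        h_n = n ** (-a)
        f = (1.0 - gamma_n) * f + gamma_n * z_term(x, x_n, h_n)
    return f


def unrolled_estimate(x, sample, gamma0=0.8, a=0.2, f0=0.0):
    """Evaluate Pi_n * sum_k Pi_k^{-1} gamma_k Z_k(x) + Pi_n * f0 directly."""
    pi_k = 1.0
    weighted_sum = 0.0
    for k, x_k in enumerate(sample, start=1):
        gamma_k = gamma0 / k
        h_k = k ** (-a)
        pi_k *= 1.0 - gamma_k  # Pi_k = prod_{j<=k} (1 - gamma_j)
        weighted_sum += gamma_k * z_term(x, x_k, h_k) / pi_k
    return pi_k * weighted_sum + pi_k * f0
```

Both functions should return the same value up to floating-point rounding for any sample; the recursive form needs only the previous estimate and the new observation, which is the computational advantage of recursive estimators.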



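Taylor's expansion with integral remainder ensures that

$$\mathbb{E}[Z_k(x)] - f(x) = \int_{\mathbb{R}^d} K(z)\left[f(x - zh_k) - f(x)\right]dz = \frac{h_k^2}{2}\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x) + h_k^2\delta_k(x) \tag{20}$$

with

$$\delta_k(x) = \sum_{1 \le i,j \le d}\int_{\mathbb{R}^d}\int_0^1 (1 - s)\,z_i z_j K(z)\left[f^{(2)}_{ij}(x - zh_k s) - f^{(2)}_{ij}(x)\right]ds\,dz,$$

and, since $f^{(2)}_{ij}$ is bounded and continuous at $x$ for all $i, j \in \{1, \ldots, d\}$, we have $\lim_{k\to\infty}\delta_k(x) = 0$.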



In the case $a \le \alpha/(d+4)$, we have $\lim_{n\to\infty}(n\gamma_n) > 2a$; the application of Lemma 2 then gives

$$\mathbb{E}[f_n(x)] - f(x) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 f^{(2)}_{jj}(x)\right)\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\,[1 + o(1)] + \Pi_n\left(f_0(x) - f(x)\right)$$

and (6) follows. In the case $a > \alpha/(d+4)$, we have $h_n^2 = o\left(\sqrt{\gamma_n h_n^{-d}}\right)$; since $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$, Lemma 2 then ensures that

$$\mathbb{E}[f_n(x)] - f(x) = \Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\,o\left(\sqrt{\gamma_k h_k^{-d}}\right) + O(\Pi_n) = o\left(\sqrt{\gamma_n h_n^{-d}}\right),$$

which gives (7). Now, we have

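$$\begin{aligned}
\mathrm{Var}[f_n(x)] &= \Pi_n^2\sum_{k=1}^{n}\Pi_k^{-2}\gamma_k^2\,\mathrm{Var}[Z_k(x)] \\
&= \Pi_n^2\sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\left[\int_{\mathbb{R}^d}K^2(z)f(x - zh_k)\,dz - h_k^d\left(\int_{\mathbb{R}^d}K(z)f(x - zh_k)\,dz\right)^2\right] \\
&= \Pi_n^2\sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\left[f(x)\int_{\mathbb{R}^d}K^2(z)\,dz + \nu_k(x) - h_k^d\tilde{\nu}_k(x)\right]
\end{aligned}$$

with

$$\nu_k(x) = \int_{\mathbb{R}^d}K^2(z)\left[f(x - zh_k) - f(x)\right]dz, \qquad \tilde{\nu}_k(x) = \left(\int_{\mathbb{R}^d}K(z)f(x - zh_k)\,dz\right)^2.$$

Since $f$ is bounded and continuous, we have $\lim_{k\to\infty}\nu_k(x) = 0$ and $\lim_{k\to\infty}h_k^d\tilde{\nu}_k(x) = 0$. In the case $a \ge \alpha/(d+4)$, we have $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$, and the application of Lemma 2 gives

$$\begin{aligned}
\mathrm{Var}[f_n(x)] &= \Pi_n^2\sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\left[f(x)\int_{\mathbb{R}^d}K^2(z)\,dz + o(1)\right] \\
&= \frac{1}{2 - (\alpha - ad)\xi}\,\frac{\gamma_n}{h_n^d}\left[f(x)\int_{\mathbb{R}^d}K^2(z)\,dz + o(1)\right],
\end{aligned}$$

which proves (8). In the case $a < \alpha/(d+4)$, we have $\gamma_n h_n^{-d} = o(h_n^4)$; since $\lim_{n\to\infty}(n\gamma_n) > 2a$, Lemma 2 then ensures that

$$\mathrm{Var}[f_n(x)] = \Pi_n^2\sum_{k=1}^{n}\Pi_k^{-2}\gamma_k\,o(h_k^4) = o(h_n^4),$$

which gives (9).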
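As a side remark, the threshold $a = \alpha/(d+4)$ that separates the cases in this proof can be read off directly from the two lower bounds appearing in (A2)iii); the short computation below only uses the exponents already stated above.

$$2a \le \frac{\alpha - ad}{2} \iff 4a + ad \le \alpha \iff a \le \frac{\alpha}{d+4},$$

so that $\min\{2a, (\alpha - ad)/2\} = 2a$ when $a \le \alpha/(d+4)$ and $\min\{2a, (\alpha - ad)/2\} = (\alpha - ad)/2$ when $a \ge \alpha/(d+4)$. Equivalently, since $(h_n^4) \in \mathcal{GS}(-4a)$ and $(\gamma_n h_n^{-d}) \in \mathcal{GS}(-(\alpha - ad))$, the squared bias and the variance are of the same order exactly when $4a = \alpha - ad$.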

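4.3 Proof of Proposition 2

Let us first note that, in view of (20), we have

$$\begin{aligned}
\int_{\mathbb{R}^d}\left\{\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left[\mathbb{E}(Z_k(x)) - f(x)\right]\right\}^2 dx
&= \frac{1}{4}\int_{\mathbb{R}^d}\left[\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right]^2 dx\,\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\right]^2 \\
&\quad + \int_{\mathbb{R}^d}\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\delta_k(x)\right]^2 dx \\
&\quad + \left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\right]\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\int_{\mathbb{R}^d}\left[\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right]\delta_k(x)\,dx\right].
\end{aligned}$$

Since $f^{(2)}_{ij}$ is continuous, bounded, and integrable for all $i, j \in \{1, \ldots, d\}$, the application of Lebesgue's convergence theorem ensures that $\lim_{k\to+\infty}\int_{\mathbb{R}^d}\delta_k^2(x)\,dx = 0$ and $\lim_{k\to+\infty}\int_{\mathbb{R}^d}\left[\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right]\delta_k(x)\,dx = 0$. Moreover, Jensen's inequality gives

$$\int_{\mathbb{R}^d}\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\delta_k(x)\right]^2 dx \le \left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\right]\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\int_{\mathbb{R}^d}\delta_k^2(x)\,dx\right],$$

so that we get

$$\begin{aligned}
\int_{\mathbb{R}^d}\left\{\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left[\mathbb{E}(Z_k(x)) - f(x)\right]\right\}^2 dx
&= \frac{1}{4}\int_{\mathbb{R}^d}\left[\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right]^2 dx\,\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\right]^2 \\
&\quad + O\left(\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k h_k^2\right]\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\,o(h_k^2)\right]\right).
\end{aligned}$$

• Let us first consider the case $a \le \alpha/(d+4)$. In this case, $\lim_{n\to\infty}(n\gamma_n) > 2a$, and the application of Lemma 2 gives

$$\int_{\mathbb{R}^d}\left\{\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left[\mathbb{E}(Z_k(x)) - f(x)\right]\right\}^2 dx = \frac{1}{4(1 - 2a\xi)^2}\,h_n^4\int_{\mathbb{R}^d}\left[\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right]^2 dx + o(h_n^4),$$

and ensures that $\Pi_n^2 = o(h_n^4)$. In view of (19), we then deduce that

$$\int_{\mathbb{R}^d}\left\{\mathbb{E}(f_n(x)) - f(x)\right\}^2 dx = \frac{1}{4(1 - 2a\xi)^2}\,h_n^4\int_{\mathbb{R}^d}\left[\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right]^2 dx + o(h_n^4). \tag{21}$$

• Let us now consider the case $a > \alpha/(d+4)$. In this case, we have $h_n^2 = o\left(\sqrt{\gamma_n h_n^{-d}}\right)$ and $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$. The application of Lemma 2 then gives

$$\int_{\mathbb{R}^d}\left\{\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left[\mathbb{E}(Z_k(x)) - f(x)\right]\right\}^2 dx = O\left(\left[\Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\,o\left(\sqrt{\gamma_k h_k^{-d}}\right)\right]^2\right) = o\left(\gamma_n h_n^{-d}\right)$$

and ensures that $\Pi_n^2 = o(\gamma_n h_n^{-d})$. In view of (19), we then deduce that

$$\int_{\mathbb{R}^d}\left\{\mathbb{E}(f_n(x)) - f(x)\right\}^2 dx = o\left(\gamma_n h_n^{-d}\right). \tag{22}$$

On the other hand, we note that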



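$$\begin{aligned}
\int_{\mathbb{R}^d}\mathrm{Var}[f_n(x)]\,dx &= \Pi_n^2\sum_{k=1}^{n}\Pi_k^{-2}\gamma_k^2\int_{\mathbb{R}^d}\mathrm{Var}[Z_k(x)]\,dx \\
&= \Pi_n^2\sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\left[\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}K^2(z)f(x - zh_k)\,dz\,dx - h_k^d\int_{\mathbb{R}^d}\left(\int_{\mathbb{R}^d}K(z)f(x - zh_k)\,dz\right)^2 dx\right]
\end{aligned}$$

with

$$\int_{\mathbb{R}^d}\int_{\mathbb{R}^d}K^2(z)f(x - zh_k)\,dz\,dx = \int_{\mathbb{R}^d}K^2(z)\left\{\int_{\mathbb{R}^d}f(x - zh_k)\,dx\right\}dz = \int_{\mathbb{R}^d}K^2(z)\,dz$$

and

$$\int_{\mathbb{R}^d}\left(\int_{\mathbb{R}^d}K(z)f(x - zh_k)\,dz\right)^2 dx = \int_{\mathbb{R}^{3d}}K(z)K(z')f(x - zh_k)f(x - z'h_k)\,dz\,dz'\,dx \le \|f\|_\infty\|K\|_1^2.$$

• In the case $a \ge \alpha/(d+4)$, we have $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$, and Lemma 2 ensures that

$$\begin{aligned}
\int_{\mathbb{R}^d}\mathrm{Var}[f_n(x)]\,dx &= \Pi_n^2\sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\left[\int_{\mathbb{R}^d}K^2(z)\,dz + o(1)\right] \\
&= \frac{\gamma_n}{h_n^d\left(2 - (\alpha - ad)\xi\right)}\int_{\mathbb{R}^d}K^2(z)\,dz + o\left(\frac{\gamma_n}{h_n^d}\right).
\end{aligned} \tag{23}$$

• In the case $a < \alpha/(d+4)$, we have $\gamma_n h_n^{-d} = o(h_n^4)$ and $\lim_{n\to\infty}(n\gamma_n) > 2a$, so that Lemma 2 gives

$$\int_{\mathbb{R}^d}\mathrm{Var}[f_n(x)]\,dx = \Pi_n^2\sum_{k=1}^{n}\Pi_k^{-2}\gamma_k\,o(h_k^4) = o(h_n^4). \tag{24}$$

Part 1 of Proposition 2 follows from the combination of (21) and (24), Part 2 from that of (21) and (23), and Part 3 from that of (22) and (23).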



4.4 Proof of Theorem 1

Let us at first assume that, if $a \ge \alpha/(d+4)$, then

$$\sqrt{\gamma_n^{-1}h_n^d}\left(f_n(x) - \mathbb{E}[f_n(x)]\right) \xrightarrow{\mathcal{D}} \mathcal{N}\left(0, \frac{1}{2 - (\alpha - ad)\xi}\,f(x)\int_{\mathbb{R}^d}K^2(z)\,dz\right). \tag{25}$$

In the case when $a > \alpha/(d+4)$, Part 1 of Theorem 1 follows from the combination of (7) and (25). In the case when $a = \alpha/(d+4)$, Parts 1 and 2 of Theorem 1 follow from the combination of (6) and (25). In the case $a < \alpha/(d+4)$, (9) implies that

$$h_n^{-2}\left(f_n(x) - \mathbb{E}(f_n(x))\right) \xrightarrow{\mathbb{P}} 0,$$

and the application of (6) gives Part 2 of Theorem 1.
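We now prove (25). In view of (1), we have

$$\begin{aligned}
f_n(x) - \mathbb{E}[f_n(x)] &= (1 - \gamma_n)\left(f_{n-1}(x) - \mathbb{E}[f_{n-1}(x)]\right) + \gamma_n\left(Z_n(x) - \mathbb{E}[Z_n(x)]\right) \\
&= \Pi_n\sum_{k=1}^{n}\Pi_k^{-1}\gamma_k\left(Z_k(x) - \mathbb{E}[Z_k(x)]\right).
\end{aligned}$$

Set

$$Y_k(x) = \Pi_k^{-1}\gamma_k\left(Z_k(x) - \mathbb{E}(Z_k(x))\right). \tag{26}$$

The application of Lemma 2 ensures that

$$\begin{aligned}
v_n^2 = \sum_{k=1}^{n}\mathrm{Var}(Y_k(x)) &= \sum_{k=1}^{n}\Pi_k^{-2}\gamma_k^2\,\mathrm{Var}(Z_k(x)) \\
&= \sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\left[f(x)\int_{\mathbb{R}^d}K^2(z)\,dz + o(1)\right] \\
&= \frac{1}{\Pi_n^2}\,\frac{\gamma_n}{h_n^d}\left[\frac{1}{2 - (\alpha - ad)\xi}\,f(x)\int_{\mathbb{R}^d}K^2(z)\,dz + o(1)\right].
\end{aligned} \tag{27}$$

On the other hand, we have, for all $p > 0$,

$$\mathbb{E}\left[|Z_k(x)|^{2+p}\right] = O\left(\frac{1}{h_k^{d(1+p)}}\right) \tag{28}$$

and, since $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$, there exists $p > 0$ such that $\lim_{n\to\infty}(n\gamma_n) > \frac{1+p}{2+p}(\alpha - ad)$. Applying Lemma 2, we get

$$\sum_{k=1}^{n}\mathbb{E}\left[|Y_k(x)|^{2+p}\right] = O\left(\sum_{k=1}^{n}\Pi_k^{-2-p}\gamma_k^{2+p}\,\mathbb{E}\left[|Z_k(x)|^{2+p}\right]\right) = O\left(\sum_{k=1}^{n}\frac{\Pi_k^{-2-p}\gamma_k^{2+p}}{h_k^{d(1+p)}}\right) = O\left(\frac{\gamma_n^{1+p}}{\Pi_n^{2+p}h_n^{d(1+p)}}\right),$$

and we thus obtain

$$\frac{1}{v_n^{2+p}}\sum_{k=1}^{n}\mathbb{E}\left[|Y_k(x)|^{2+p}\right] = O\left(\left[\gamma_n h_n^{-d}\right]^{p/2}\right) = o(1).$$

The convergence in (25) then follows from the application of Lyapounov's Theorem.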

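4.5 Proof of Theorem 2

Set

$$S_n(x) = \sum_{k=1}^{n}Y_k(x),$$

where $Y_k$ is defined in (26), and set $\gamma_0 = h_0 = 1$.

• Let us first consider the case $a \ge \alpha/(d+4)$ (in which case $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$). We set $H_n^2 = \Pi_n^2\gamma_n^{-1}h_n^d$, and note that, since $(\gamma_n^{-1}h_n^d) \in \mathcal{GS}(\alpha - ad)$, we have

$$\begin{aligned}
\ln\left(H_n^{-2}\right) &= -2\ln(\Pi_n) + \ln\left(\prod_{k=1}^{n}\frac{\gamma_{k-1}^{-1}h_{k-1}^d}{\gamma_k^{-1}h_k^d}\right) \\
&= -2\sum_{k=1}^{n}\ln(1 - \gamma_k) + \sum_{k=1}^{n}\ln\left(1 - \frac{\alpha - ad}{k} + o\left(\frac{1}{k}\right)\right) \\
&= \sum_{k=1}^{n}\left(2\gamma_k + o(\gamma_k)\right) - \sum_{k=1}^{n}\left((\alpha - ad)\xi\gamma_k + o(\gamma_k)\right) \\
&= \left(2 - \xi(\alpha - ad)\right)s_n + o(s_n).
\end{aligned} \tag{29}$$

Since $2 - \xi(\alpha - ad) > 0$, it follows in particular that $\lim_{n\to+\infty}H_n^{-2} = \infty$. Moreover, we clearly have $\lim_{n\to+\infty}H_n^2/H_{n-1}^2 = 1$, and by (27),

$$\lim_{n\to+\infty}H_n^2\sum_{k=1}^{n}\mathrm{Var}[Y_k(x)] = \frac{1}{2 - (\alpha - ad)\xi}\,f(x)\int_{\mathbb{R}^d}K^2(z)\,dz.$$

Now, in view of (28), $\mathbb{E}\left[|Y_k(x)|^3\right] = O\left(\Pi_k^{-3}\gamma_k^3h_k^{-2d}\right)$ and, since $\lim_{n\to\infty}(n\gamma_n) > (\alpha - ad)/2$, the application of Lemma 2 and of (29) gives

$$\begin{aligned}
\frac{1}{n\sqrt{n}}\sum_{k=1}^{n}\mathbb{E}\left(|H_nY_k(x)|^3\right) &= O\left(\frac{H_n^3}{n\sqrt{n}}\sum_{k=1}^{n}\Pi_k^{-3}\gamma_k^3h_k^{-2d}\right) \\
&= O\left(\frac{H_n^3}{n\sqrt{n}}\sum_{k=1}^{n}\Pi_k^{-3}\gamma_k\,o\left(\left[\gamma_kh_k^{-d}\right]^{3/2}\right)\right) \\
&= o\left(\frac{1}{n\sqrt{n}}\right) = o\left(\left[\ln\left(H_n^{-2}\right)\right]^{-1}\right).
\end{aligned}$$

The application of Theorem 1 in Mokkadem and Pelletier (2007b) then ensures that, with probability one, the sequence

$$\left(\frac{H_nS_n(x)}{\sqrt{2\ln\ln\left(H_n^{-2}\right)}}\right) = \left(\frac{\sqrt{\gamma_n^{-1}h_n^d}\left(f_n(x) - \mathbb{E}(f_n(x))\right)}{\sqrt{2\ln\ln\left(H_n^{-2}\right)}}\right)$$

is relatively compact and its limit set is the interval

$$\left[-\sqrt{\frac{f(x)}{2 - (\alpha - ad)\xi}\int_{\mathbb{R}^d}K^2(z)\,dz},\ \sqrt{\frac{f(x)}{2 - (\alpha - ad)\xi}\int_{\mathbb{R}^d}K^2(z)\,dz}\right]. \tag{30}$$

In view of (29), we have $\lim_{n\to\infty}\ln\ln\left(H_n^{-2}\right)/\ln s_n = 1$. It follows that, with probability one, the sequence $\left(\sqrt{\gamma_n^{-1}h_n^d}\left(f_n(x) - \mathbb{E}(f_n(x))\right)/\sqrt{2\ln s_n}\right)$ is relatively compact, and its limit set is the interval given in (30). The application of (6) (respectively (7)) concludes the proof of Theorem 2 in the case $a = \alpha/(d+4)$ (respectively $a > \alpha/(d+4)$).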

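• Let us now consider the case $a < \alpha/(d+4)$ (in which case $\lim_{n\to\infty}(n\gamma_n) > 2a$). Set $H_n^{-2} = \Pi_n^{-2}h_n^4\left(\ln\ln\left(\Pi_n^{-2}h_n^4\right)\right)^{-1}$, and note that, since $(h_n^{-4}) \in \mathcal{GS}(4a)$, we have

$$\begin{aligned}
\ln\left(\Pi_n^{-2}h_n^4\right) &= -2\ln(\Pi_n) + \ln\left(\prod_{k=1}^{n}\frac{h_{k-1}^{-4}}{h_k^{-4}}\right) \\
&= -2\sum_{k=1}^{n}\ln(1 - \gamma_k) + \sum_{k=1}^{n}\ln\left(1 - \frac{4a}{k} + o\left(\frac{1}{k}\right)\right) \\
&= \sum_{k=1}^{n}\left(2\gamma_k + o(\gamma_k)\right) - \sum_{k=1}^{n}\left(4a\xi\gamma_k + o(\gamma_k)\right) \\
&= (2 - 4a\xi)s_n + o(s_n).
\end{aligned} \tag{31}$$

Since $2 - 4a\xi > 0$, it follows in particular that $\lim_{n\to\infty}\Pi_n^{-2}h_n^4 = \infty$, and thus $\lim_{n\to\infty}H_n^{-2} = \infty$. Moreover, we clearly have $\lim_{n\to\infty}H_n^2/H_{n-1}^2 = 1$. Set $\epsilon \in \,]0, \alpha - (d+4)a[$ such that $\lim_{n\to\infty}(n\gamma_n) > 2a + \epsilon/2$; in view of (27), and applying Lemma 2, we get

$$\begin{aligned}
H_n^2\sum_{k=1}^{n}\mathrm{Var}[Y_k(x)] &= O\left(\Pi_n^2h_n^{-4}\ln\ln\left(\Pi_n^{-2}h_n^4\right)\sum_{k=1}^{n}\frac{\Pi_k^{-2}\gamma_k^2}{h_k^d}\right) \\
&= O\left(\Pi_n^2h_n^{-4}\ln\ln\left(\Pi_n^{-2}h_n^4\right)\sum_{k=1}^{n}\Pi_k^{-2}\gamma_k\,o\left(h_k^4k^{-\epsilon}\right)\right) \\
&= o(1).
\end{aligned}$$

Moreover, applying (28), Lemma 2 and (31), we obtain

$$\begin{aligned}
\frac{1}{n\sqrt{n}}\sum_{k=1}^{n}\mathbb{E}\left(|H_nY_k(x)|^3\right) &= O\left(\frac{H_n^3}{n\sqrt{n}}\sum_{k=1}^{n}\Pi_k^{-3}\gamma_k^3h_k^{-2d}\right) \\
&= O\left(\frac{\Pi_n^3h_n^{-6}\left[\ln\ln\left(\Pi_n^{-2}h_n^4\right)\right]^{3/2}}{n\sqrt{n}}\sum_{k=1}^{n}\Pi_k^{-3}\gamma_k\,o\left(h_k^6\right)\right) \\
&= o\left(\frac{\left[\ln\ln\left(\Pi_n^{-2}h_n^4\right)\right]^{3/2}}{n\sqrt{n}}\right) = o\left(\left[\ln\left(H_n^{-2}\right)\right]^{-1}\right).
\end{aligned}$$

The application of Theorem 1 in Mokkadem and Pelletier (2007b) then ensures that, with probability one,

$$\lim_{n\to\infty}\frac{H_nS_n(x)}{\sqrt{2\ln\ln\left(H_n^{-2}\right)}} = \lim_{n\to\infty}\frac{h_n^{-2}\sqrt{\ln\ln\left(\Pi_n^{-2}h_n^4\right)}}{\sqrt{2\ln\ln\left(H_n^{-2}\right)}}\left(f_n(x) - \mathbb{E}(f_n(x))\right) = 0.$$

Noting that (31) ensures that $\lim_{n\to\infty}\ln\ln\left(H_n^{-2}\right)/\ln\ln\left(\Pi_n^{-2}h_n^4\right) = 1$, we deduce that

$$\lim_{n\to\infty}h_n^{-2}\left[f_n(x) - \mathbb{E}(f_n(x))\right] = 0 \quad \text{a.s.},$$

and Theorem 2 in the case $a < \alpha/(d+4)$ follows from (6).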

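4.6 Proof of Corollaries 1 to 4

In view of (8), to minimize the variance of $f_n$, the stepsize $(\gamma_n)$ must belong to $\mathcal{GS}(-1)$ and satisfy $\lim_{n\to\infty}n\gamma_n = \gamma_0 \in \,]0, \infty[$. For such a choice, $\xi = \gamma_0^{-1}$, so that (8) can be rewritten as

$$\mathrm{Var}[f_n(x)] = \frac{\gamma_0}{2 - (1 - ad)\gamma_0^{-1}}\,\frac{1}{nh_n^d}\,f(x)\int_{\mathbb{R}^d}K^2(z)\,dz\,[1 + o(1)].$$

The function $\gamma_0 \mapsto \gamma_0/\left[2 - (1 - ad)\gamma_0^{-1}\right]$ reaching its minimum at the point $\gamma_0 = 1 - ad$, Corollary 1 follows.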
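The location of this minimum can be double-checked with a crude grid search. The sketch below is an illustration only; the grid, the values of $a$ and $d$, and the function names are assumptions made for the example.

```python
def variance_factor(gamma0, a, d):
    """gamma0 / (2 - (1 - a*d) / gamma0): stepsize-dependent factor of the variance."""
    return gamma0 / (2.0 - (1.0 - a * d) / gamma0)


def argmin_variance_factor(a, d, steps=20000):
    """Grid search over gamma0 in ](1 - a*d)/2, (1 - a*d)/2 + 2] (hypothetical grid)."""
    lower = (1.0 - a * d) / 2.0  # the factor is positive only for gamma0 > lower
    best_gamma0, best_value = None, float("inf")
    for i in range(1, steps + 1):
        gamma0 = lower + 2.0 * i / steps
        value = variance_factor(gamma0, a, d)
        if value < best_value:
            best_gamma0, best_value = gamma0, value
    return best_gamma0, best_value
```

For instance, with `a = 0.2` and `d = 2`, the search should return a minimizer close to $1 - ad = 0.6$, where the factor itself equals $1 - ad$.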

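Let us now prove Corollary 4. When $\lim_{n\to\infty}n\gamma_n = \gamma_0 > 0$ and $\lim_{n\to\infty}nh_n^{d+4} = 0$, the first part of Theorem 1 ensures that

$$\sqrt{nh_n^d}\left(f_n(x) - f(x)\right) \xrightarrow{\mathcal{D}} \mathcal{N}\left(0, \frac{\gamma_0}{2 - (1 - ad)\gamma_0^{-1}}\,f(x)\int_{\mathbb{R}^d}K^2(z)\,dz\right).$$

Proposition 1 ensuring the consistency of $f_n$, Corollary 4 follows.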

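We now show how Corollary 2 can be deduced from Proposition 1; Corollary 3 is deduced from Proposition 2 exactly in the same way, so that its proof is omitted. Set

$$C_1(\xi) = \frac{1}{4(1 - 2a\xi)^2}\left(\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right)^2, \qquad C_2(\xi) = \frac{1}{2 - (\alpha - ad)\xi}\,f(x)\int_{\mathbb{R}^d}K^2(z)\,dz.$$

The application of Proposition 1 ensures that

$$\mathrm{MSE} = \begin{cases}
C_1(\xi)h_n^4 + o\left(h_n^4\right) & \text{if } a < \alpha/(d+4), \\
C_1(\xi)h_n^4 + C_2(\xi)\gamma_nh_n^{-d} + o\left(h_n^4 + \gamma_nh_n^{-d}\right) & \text{if } a = \alpha/(d+4), \\
C_2(\xi)\gamma_nh_n^{-d} + o\left(\gamma_nh_n^{-d}\right) & \text{if } a > \alpha/(d+4).
\end{cases} \tag{32}$$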

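Set $\alpha \in \,]1/2, 1]$. If $a = \alpha/(d+4)$, $\left(C_1(\xi)h_n^4 + C_2(\xi)\gamma_nh_n^{-d}\right) \in \mathcal{GS}(-4\alpha/(d+4))$. If $a < \alpha/(d+4)$, $(h_n^4) \in \mathcal{GS}(-4a)$ with $-4a > -4\alpha/(d+4)$, and, if $a > \alpha/(d+4)$, $(\gamma_nh_n^{-d}) \in \mathcal{GS}(-\alpha + ad)$ with $-\alpha + ad > -4\alpha/(d+4)$. It follows that, for a given $\alpha$, to minimize the MSE of $f_n$, the parameter $a$ must be chosen equal to $\alpha/(d+4)$. Moreover, in view of (32), the parameter $\alpha$ must be chosen equal to 1. In other words, to minimize the MSE of $f_n$, the stepsize $(\gamma_n)$ must be chosen in $\mathcal{GS}(-1)$, the bandwidth $(h_n)$ in $\mathcal{GS}(-1/(d+4))$ (and, in view of (A2)iii), the condition $\lim_{n\to\infty}n\gamma_n > 2/(d+4)$ must be fulfilled). For this choice of stepsize and bandwidth, set $\mathcal{L}_n = n\gamma_n$ and $\tilde{\mathcal{L}}_n = n^{1/(d+4)}h_n$. The MSE of $f_n$ can then be rewritten as

$$\mathrm{MSE} = n^{-\frac{4}{d+4}}\left[C_1(\xi)\tilde{\mathcal{L}}_n^4 + C_2(\xi)\mathcal{L}_n\tilde{\mathcal{L}}_n^{-d}\right][1 + o(1)].$$

Now, set $\mathcal{L}_n$. Since the function $x \mapsto C_1(\xi)x^4 + C_2(\xi)\mathcal{L}_nx^{-d}$ reaches its minimum at the point $\left(dC_2(\xi)\mathcal{L}_n/[4C_1(\xi)]\right)^{1/(d+4)}$, to minimize the MSE of $f_n$, $\tilde{\mathcal{L}}_n$ must be chosen equal to $\left(dC_2(\xi)\mathcal{L}_n/[4C_1(\xi)]\right)^{1/(d+4)}$, that is, $(h_n)$ must equal $\left(\left(dC_2(\xi)/[4C_1(\xi)]\right)\gamma_n\right)^{1/(d+4)}$. For such a choice, the MSE of $f_n$ can be rewritten as

$$\mathrm{MSE} = n^{-\frac{4}{d+4}}\mathcal{L}_n^{\frac{4}{d+4}}\,\frac{d+4}{4^{\frac{4}{d+4}}d^{\frac{d}{d+4}}}\,[C_1(\xi)]^{\frac{d}{d+4}}[C_2(\xi)]^{\frac{4}{d+4}}[1 + o(1)].$$

It follows that to minimize the MSE of $f_n$, the limit of $\mathcal{L}_n$ (that is, of $n\gamma_n$) must be finite (and larger than $2/(d+4)$). Now, set $\gamma_0 > 2/(d+4)$ and $\mathcal{L}_n = \gamma_0\delta_n$ with $\lim_{n\to\infty}\delta_n = 1$ (so that $\lim_{n\to\infty}n\gamma_n = \gamma_0$). In this case, we have $\xi = \gamma_0^{-1}$,

$$C_1(\xi) = \frac{\gamma_0^2}{4\left(\gamma_0 - \frac{2}{d+4}\right)^2}\,C_1, \qquad C_1 = \left(\sum_{j=1}^{d}\mu_j^2 f^{(2)}_{jj}(x)\right)^2,$$

$$C_2(\xi) = \frac{\gamma_0}{2\left(\gamma_0 - \frac{2}{d+4}\right)}\,C_2, \qquad C_2 = f(x)\int_{\mathbb{R}^d}K^2(z)\,dz,$$

and the MSE of $f_n$ can be rewritten as

$$\mathrm{MSE} = n^{-\frac{4}{d+4}}\delta_n^{\frac{4}{d+4}}\,\frac{d+4}{4^{\frac{4}{d+4}}d^{\frac{d}{d+4}}}\,\frac{\gamma_0^2}{\left[2\left(\gamma_0 - \frac{2}{d+4}\right)\right]^{\frac{2d+4}{d+4}}}\,C_1^{\frac{d}{d+4}}C_2^{\frac{4}{d+4}}[1 + o(1)].$$

The function $x \mapsto x^2/\left(x - 2/(d+4)\right)^{\frac{2d+4}{d+4}}$ reaching its minimum at the point $x = 1$, to minimize the MSE of $f_n$, $\gamma_0$ must be chosen equal to 1. Corollary 2 follows.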
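The two minimizations used in this argument can be verified numerically. The sketch below is an illustration under assumed values (the constants passed to the functions, the grid, and all names are hypothetical); it checks that $x \mapsto x^2/(x - 2/(d+4))^{(2d+4)/(d+4)}$ is minimized near $x = 1$ and that the stated point minimizes $x \mapsto C_1x^4 + C_2\mathcal{L}x^{-d}$.

```python
def stepsize_factor(x, d):
    """x**2 / (x - 2/(d+4)) ** ((2d+4)/(d+4)), defined for x > 2/(d+4)."""
    shift = 2.0 / (d + 4)
    return x * x / (x - shift) ** ((2.0 * d + 4.0) / (d + 4.0))


def bandwidth_objective(x, c1, c2, level, d):
    """c1 * x**4 + c2 * level * x**(-d), the bracketed term of the MSE."""
    return c1 * x ** 4 + c2 * level * x ** (-d)


def optimal_bandwidth_constant(c1, c2, level, d):
    """Stated minimizer (d * c2 * level / (4 * c1)) ** (1 / (d + 4))."""
    return (d * c2 * level / (4.0 * c1)) ** (1.0 / (d + 4))


def grid_argmin(func, lower, upper, steps=30000):
    """Return the grid point of ]lower, upper[ at which func is smallest."""
    best_x, best_value = None, float("inf")
    for i in range(1, steps):
        x = lower + (upper - lower) * i / steps
        value = func(x)
        if value < best_value:
            best_x, best_value = x, value
    return best_x
```

For example, `grid_argmin(lambda x: stepsize_factor(x, 2), 2.0 / 6.0, 2.0 / 6.0 + 3.0)` should return a value close to 1, in line with the choice $\gamma_0 = 1$ in the proof.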
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